
Understanding Childhood Insomnia: Exploring Contributing Factors and Strategies for Resolution¶
Author: Jose G. Chavez
Introduction¶
Childhood Insomnia is a condition that holds immense significance due to its potential impact on the physical, mental, and emotional well-being of adolescents. As adolescents navigate the critical phase of development, the quality of their sleep plays a pivotal role in shaping their overall health, cognitive function, and emotional resilience. Understanding the factors that contribute to childhood insomnia is not only academically intriguing but also practically essential for devising interventions that promote healthier sleep patterns in this age group.
In this article, we delve into an exploratory analysis of the Adolescent Insomnia Study dataset. This dataset encapsulates a wealth of information regarding demographic, psychometric, clinical, and item-level data collected from a study focused on insomnia in adolescents. By embarking on this analysis, we aim to uncover hidden connections between various psychometric traits, coping mechanisms, and sleep quality.
This article is the first part of a series of articles that will explore the dataset and its potential implications/directions for further study.
I am not a clinician; I am a mathematics PhD with an interest in data, child development, and education.
The goal of the series as a whole is to explore the data and uncover potentially interesting connections between the various psychometric traits and coping skills present in adolescents.
We will be creating multiple notebooks that both explore various aspects of the data and demonstrate how to create models (machine learning and statistical) of various sorts (whatever is most appropriate to the data).
The author found the dataset on Kaggle: https://www.kaggle.com/datasets/utkarshx27/insomnia-symptomatology-in-adolescence
The analysis is divided into several steps, each of which is described in detail in the following sections.
import matplotlib.pyplot as plt
import matplotlib
from contextlib import redirect_stdout
import io
import seaborn as sns
available_styles = plt.style.available
# Create a dummy file-like object to capture output
dummy_output = io.StringIO()
'''
# Use the context manager to redirect output to the dummy file-like object
with redirect_stdout(dummy_output):
    for style_name in available_styles:
        print(f"{style_name}: plt.style.use('{style_name}')")
plt.style.use('seaborn-darkgrid') # Replace 'seaborn' with your chosen style
'''
plt.rcParams['figure.figsize'] = [3, 2]
sns.set(rc={'figure.figsize':(3,2)})
Contact me:¶
If you have any questions, would like to collaborate and/or have any relevant data you want to share please say hello at jgcblue9558@gmail.com!
Libraries We Need and Importing Data¶
We will be importing:
- pandas: for its DataFrame data structures and the many tools that ship with it;
- sklearn: a library that has many "models" just waiting to be instantiated and filled with your parameters;
- seaborn: a nice alternative to the matplotlib library for data visualization (built on top of it, to be precise). It has some more modern visual tools and is geared slightly more towards statistics than matplotlib.
import pandas as pd
# tools for splitting the dataset for training and testing
from sklearn.model_selection import train_test_split
# tool for standardizing features
from sklearn.preprocessing import StandardScaler
# the model
from sklearn.linear_model import LogisticRegression
# tools for evaluating model performance
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
# Load the data
insomnia_data = pd.read_csv('insomnia_data.csv')
insomnia_item_level_data = pd.read_csv('insomnia_item_level_data.csv')
insomnia_data_dictionary = pd.read_csv('insomnia_data_dictionary.csv')
#Even simpler names for these objects
ind=insomnia_data
ini=insomnia_item_level_data
indict=insomnia_data_dictionary
selected_rows = insomnia_data_dictionary['Columns'].iloc[[3,4,16,17,18]] # Replace with your desired indices
#print(selected_rows)
pd.set_option('display.max_columns', 10) # Display up to 10 columns without truncation
#display(insomnia_data)
On the CSV Files:¶
You may or may not be familiar with the way data is typically packaged in statistical studies. Here we see a common arrangement for such a "package":
- A csv file with the data in its most typical form;
- A csv file with a dictionary that helps the reader understand what the features/items mean;
- An "item-level" csv file, in which the items (the individual questions of each psychometric questionnaire) are provided above the standard headings.
We will work (throughout this notebook at least) with the first two. Our datasets have a high number of columns, so we really will be forced to work with the dictionary csv. Good practice!
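Since the column names are short codes, a small lookup helper can make the dictionary easier to consult. The sketch below is only illustrative: `describe_column` is a hypothetical helper (it is not part of the original notebook), and the tiny frame stands in for insomnia_data_dictionary.csv, reusing the Columns and Description headings that appear in the dictionary output further down.

```python
import pandas as pd

# Toy stand-in for insomnia_data_dictionary.csv (same two headings).
toy_dictionary = pd.DataFrame({
    "Columns": ["Group", "Sex", "Age"],
    "Description": ["INSOMNIA= 1, CONTROL=0", "MALE = 1, FEMALE = 0", "Years"],
})

def describe_column(dictionary: pd.DataFrame, name: str) -> str:
    """Return the description for a column code, or a fallback message."""
    lookup = dictionary.set_index("Columns")["Description"]
    return lookup.get(name, f"No description found for '{name}'")

print(describe_column(toy_dictionary, "Sex"))     # a code that is in the dictionary
print(describe_column(toy_dictionary, "Height"))  # a code that is not
```

On the real data one would pass `insomnia_data_dictionary` in place of `toy_dictionary`.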
Data Cleaning¶
I remark here that we are quite lucky in that the data set has been "engineered" quite well. It is clean without null values or missing information. So this part of the project will be trivial.
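A claim like this is easy to check directly. The following is a minimal sketch on a made-up frame (the `toy` data and its `Unnamed: 2` column are invented for illustration); on the real data one would run the same calls on `insomnia_data`. An entirely empty column, for example one created by a trailing comma in a CSV header row, is a common way for missing values to slip in, and the `Unnamed: 95` column that shows up with a NaN correlation later in this article may be one of these.

```python
import numpy as np
import pandas as pd

# Made-up data: one missing Age value and one completely empty column.
toy = pd.DataFrame({
    "ISI_total": [3, 14, 20, 7],
    "Age": [16.9, 17.3, np.nan, 18.8],
    "Unnamed: 2": [np.nan] * 4,
})

# Count missing values per column and keep only the columns that have any.
missing_per_column = toy.isna().sum()
columns_with_missing = missing_per_column[missing_per_column > 0]
print(columns_with_missing)

# Columns that are empty in every row carry no information and can be dropped.
empty_columns = toy.columns[toy.isna().all()].tolist()
cleaned = toy.drop(columns=empty_columns)
print(empty_columns, cleaned.shape)
```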
The Data Set .... Through Two Lenses¶
Since this is a notebook within a website dealing with both "Data Analysis" and "Data Science", I want to take a little time to discuss which sorts of methodologies each is suited for, and why.
Let's consider both domains:
Data Analysis¶
Recall that Data Analysis is closely related to Statistics. In fact, it might be considered a superset in some ways and a subset in others (it doesn't require that one conduct studies, for instance). The data we have is not tiny, but it is definitely not large.
We can do things like:
- Create scatter plots;
- Calculate summary statistics such as mean, standard deviation, mode, etc.;
- Pose questions that could relate to actions/policy changes.
Data Science
When one uses the words Data Science, one is usually referring to training models (programs) capable of ingesting new data and predicting categories and/or numerical values (most often one wants to predict labels, but strictly speaking the field is not limited to that).
One typically wants a lot of data. That is the main thing to remember when deciding whether to consider machine learning. How big is "big enough" is somewhat flexible, however: some models perform well with smallish data sets.
Why are there multiple csv files?¶
If this is your first time looking at statistical data, then it's probably your first time seeing this sort of arrangement. When working with such data sets, one often wants short, encoded titles for the columns, both for readability and for coding reasons. Of course, one can forget what these columns/features represent! The way around this issue is to include other associated documents (commonly as part of, say, an Excel workbook). That is why we have the file insomnia_data_dictionary.csv.
As for insomnia_item_level_data.csv, it is composed of the self-reported, item-level scores for each metric.
Let's take a look at the dictionary's contents using the display function.
display(insomnia_data_dictionary)
| | Columns | Description |
|---|---|---|
| 0 | Group | INSOMNIA= 1, CONTROL=0 |
| 1 | SubGroup | clean INSOMNIA= 2, subclinical INSOMNIA = 1, C... |
| 2 | Remote | Remote data collection = 1, In person data col... |
| 3 | Sex | MALE = 1, FEMALE = 0 |
| 4 | Age | Years |
| ... | ... | ... |
| 90 | ders_goals | DERS Difficulties Engaging in Goal-Directed Be... |
| 91 | ders_impulse | DERS Impulse control difficulties (IMPULSE) (D... |
| 92 | ders_awareness | DERS Lack of Emotional Awareness (AWARENESS) (... |
| 93 | ders_strategies | DERS Limited Access to Emotion Regulation Stra... |
| 94 | ders_clarity | DERS Lack of Emotional Clarity (CLARITY) (Diff... |
95 rows × 2 columns
Now let's take a look at the insomnia data
display(insomnia_data)
| | ID | Group | SubGroup | Remote | Sex | ... | Zders_impulse | Zders_awareness | Zders_strategies | Zders_clarity | ZDERS_total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sub_001 | 0 | 0 | 0 | 0 | ... | -0.114565 | 1.083087 | -0.656051 | 0.538016 | 0.575806 |
| 1 | sub_002 | 0 | 0 | 0 | 0 | ... | -0.114565 | 1.083087 | -0.656051 | 0.538016 | 0.153943 |
| 2 | sub_003 | 0 | 0 | 0 | 1 | ... | -0.425527 | 0.271626 | -0.656051 | -0.260601 | -0.619473 |
| 3 | sub_004 | 0 | 0 | 0 | 0 | ... | -0.114565 | 0.596210 | -0.116442 | 0.538016 | 0.224254 |
| 4 | sub_005 | 1 | 2 | 0 | 1 | ... | 0.196397 | 0.109334 | 0.153363 | 1.336632 | 0.857049 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 90 | sub_091 | 1 | 1 | 1 | 1 | ... | -0.736489 | 0.109334 | -0.925856 | 0.937324 | 0.294564 |
| 91 | sub_092 | 1 | 2 | 1 | 0 | ... | 0.818321 | -0.702127 | -0.656051 | 1.735941 | 0.013322 |
| 92 | sub_093 | 1 | 2 | 1 | 1 | ... | -0.736489 | 0.920795 | 0.153363 | 2.135249 | 1.208601 |
| 93 | sub_094 | 1 | 2 | 1 | 0 | ... | -1.358413 | -2.162757 | -1.735270 | -2.656452 | -2.939721 |
| 94 | sub_095 | 1 | 1 | 1 | 0 | ... | 2.373131 | -1.675881 | 2.581605 | -1.059218 | 0.997670 |
95 rows × 174 columns
As you can see the dictionary csv file explains the labels in the insomnia csv. This is common practice when labels are too long to include in the dataframes' labels proper.
Exploring the Dataset¶
First, let's take a look at the dataset using pandas' head method, which lets us see a manageable amount of the data.
print(insomnia_data.head(1))
        ID  Group  SubGroup  Remote  Sex  ...  Zders_impulse  Zders_awareness  \
0  sub_001      0         0       0    0  ...      -0.114565         1.083087

   Zders_strategies  Zders_clarity  ZDERS_total
0         -0.656051       0.538016     0.575806

[1 rows x 174 columns]
Investigating What the Labels Mean¶
Recall that we can use the insomnia_data_dictionary to understand what the labels mean.
# Display the first few rows of each dataframe to understand the structure of the data
print('Insomnia Data:')
display(insomnia_data.head())
print('\nInsomnia Item Level Data:')
display(insomnia_item_level_data.head())
print('\nInsomnia Data Dictionary:')
display(insomnia_data_dictionary.head())
Insomnia Data:
| | ID | Group | SubGroup | Remote | Sex | ... | Zders_impulse | Zders_awareness | Zders_strategies | Zders_clarity | ZDERS_total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sub_001 | 0 | 0 | 0 | 0 | ... | -0.114565 | 1.083087 | -0.656051 | 0.538016 | 0.575806 |
| 1 | sub_002 | 0 | 0 | 0 | 0 | ... | -0.114565 | 1.083087 | -0.656051 | 0.538016 | 0.153943 |
| 2 | sub_003 | 0 | 0 | 0 | 1 | ... | -0.425527 | 0.271626 | -0.656051 | -0.260601 | -0.619473 |
| 3 | sub_004 | 0 | 0 | 0 | 0 | ... | -0.114565 | 0.596210 | -0.116442 | 0.538016 | 0.224254 |
| 4 | sub_005 | 1 | 2 | 0 | 1 | ... | 0.196397 | 0.109334 | 0.153363 | 1.336632 | 0.857049 |
5 rows × 174 columns
Insomnia Item Level Data:
| | Unnamed: 0 | Pittsburgh Sleep Quality Index (PSQI) | Pittsburgh Sleep Quality Index (PSQI).1 | Pittsburgh Sleep Quality Index (PSQI).2 | Pittsburgh Sleep Quality Index (PSQI).3 | ... | Race, Ethnicity & Sex.5 | Race, Ethnicity & Sex.6 | Race, Ethnicity & Sex.7 | Race, Ethnicity & Sex.8 | Race, Ethnicity & Sex.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | During the past month, what time have you usua... | During the past month, how long (in minutes) h... | During the past month, what time have you usua... | How much time (in minutes) do you usually spen... | ... | Race (choice=<strong>Unknown / Not Reported</s... | Ethnicity (choice=<strong>Hispanic or Latino</... | Ethnicity (choice=<strong>NOT Hispanic or Lati... | Ethnicity (choice=<strong>Unknown / Not Report... | Gender (0=Female, 1=Male) |
| 1 | Record ID | psqi1 | psqi2 | psqi3 | psqi_4a | ... | race___5 | ethnicity___0 | ethnicity___1 | ethnicity___2 | Sex |
| 2 | sub_001 | 2200 | 10 | 700 | 15 | ... | 0 | 0 | 1 | 0 | 0 |
| 3 | sub_002 | 2200 | 10 | 500 | 10 | ... | 0 | 0 | 1 | 0 | 0 |
| 4 | sub_003 | 2300 | 45 | 730 | 0 | ... | 0 | 0 | 1 | 0 | 1 |
5 rows × 471 columns
Insomnia Data Dictionary:
| | Columns | Description |
|---|---|---|
| 0 | Group | INSOMNIA= 1, CONTROL=0 |
| 1 | SubGroup | clean INSOMNIA= 2, subclinical INSOMNIA = 1, C... |
| 2 | Remote | Remote data collection = 1, In person data col... |
| 3 | Sex | MALE = 1, FEMALE = 0 |
| 4 | Age | Years |
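The item-level preview above shows why that file needs a little reshaping before use: the real column codes (psqi1, psqi2, ...) sit in the second data row, underneath a row of question text. The sketch below shows one possible way to tidy such a layout. It runs on a toy frame that mimics the preview; the question strings are placeholders rather than the real wording, and the approach is an assumption on my part, not something the original notebook does.

```python
import pandas as pd

# Toy frame mimicking the preview: row 0 = question text, row 1 = item codes,
# rows 2+ = one participant per row. Question strings are placeholders.
raw = pd.DataFrame({
    "Unnamed: 0": [None, "Record ID", "sub_001", "sub_002"],
    "Pittsburgh Sleep Quality Index (PSQI)": ["<question text for psqi1>", "psqi1", "2200", "2200"],
    "Pittsburgh Sleep Quality Index (PSQI).1": ["<question text for psqi2>", "psqi2", "10", "10"],
})

question_text = raw.iloc[0]   # human-readable wording of each item
item_codes = raw.iloc[1]      # short codes to use as column names

# Keep only the participant rows and promote the item codes to headers.
tidy = raw.iloc[2:].copy()
tidy.columns = item_codes.to_list()
tidy = tidy.set_index("Record ID")

# Everything was read as text, so convert the answers to numbers.
tidy = tidy.apply(pd.to_numeric, errors="coerce")

# Keep the wording around as a code -> question lookup.
questions = dict(zip(item_codes, question_text))
print(tidy)
```

With the real file, `raw` would be `insomnia_item_level_data`; any answer that is not numeric would become NaN under `errors="coerce"`, so it is worth checking for those afterwards.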
Brief Aside on Z-scores:¶
Z-scores, also known as standard scores, are a statistical measure that quantifies how far a particular data point is from the mean of a dataset when measured in terms of standard deviations. They are used to standardize data and allow comparisons between data points that may have different units or scales. Z-scores are calculated using the formula:
$$ Z = \frac{x - \mu}{\sigma} $$
Where:
- $ Z $ is the z-score.
- $ x $ is the individual data point.
- $ \mu $ is the mean of the dataset.
- $\sigma $ is the standard deviation of the dataset.
Here's what z-scores mean:
Sign and Direction: The sign of the z-score (+ or -) indicates whether the data point is above or below the mean. Positive z-scores indicate data points above the mean, while negative z-scores indicate data points below it.
Magnitude: The magnitude of the z-score indicates how many standard deviations the data point is from the mean. A larger absolute z-score implies that the data point is farther from the mean in terms of standard deviations.
Comparison: Z-scores allow you to compare data points from different distributions. By standardizing data, you can assess how unusual or typical a particular data point is within its distribution.
Outliers: Data points whose z-scores exceed a chosen threshold in absolute value (commonly 2 or 3) are often considered outliers, as they deviate substantially from the mean.
Normalization: Z-scores standardize data, making it easier to analyze and compare data with different units and scales.
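As a small worked example of the formula above, the sketch below standardizes a handful of made-up scores with pandas. The numbers are invented for illustration. Note one assumption: `ddof=0` gives the population standard deviation, whereas pandas defaults to the sample version (`ddof=1`); which convention was used for the Z columns in this dataset is not stated, so treat the choice here as illustrative.

```python
import pandas as pd

# Made-up raw scores; mean = 14, population standard deviation = sqrt(8).
scores = pd.Series([10, 12, 14, 16, 18], name="toy_score")

# Z = (x - mu) / sigma, applied element-wise.
z_scores = (scores - scores.mean()) / scores.std(ddof=0)

print(z_scores.round(3).tolist())  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

The middle score equals the mean, so its z-score is 0; the standardized values themselves have mean 0 and standard deviation 1.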
Z-scores and our Data¶
Recall that there are many columns whose labels are prefixed with "Z". These columns hold the participants' z-scores for the corresponding trait. For example, if a subject has a value of 0 in ZDERS_total, the z-score of ders_total (the total score on the Difficulties in Emotion Regulation Scale, DERS), then that person has exactly the mean value for that psychometric.
## Gathering Column Names
#column_names_tuple = tuple(insomnia_data.columns)
#print(column_names_tuple)
#print(len(column_names_tuple))
A Look at Some Correlations for All Races/Ethnicities¶
target_column = 'ISI_total'
# Calculate correlations with the chosen column
correlations = insomnia_data.corr(numeric_only=True)[target_column].sort_values(ascending=False)
pd.set_option('display.max_rows', 50)
print(correlations.head(10))
ISI_total        1.000000
ZISI_total       1.000000
ZPSQI_total      0.708189
PSQI_total       0.708189
SubGroup         0.692995
ZGCTI_total      0.619535
GCTI_total       0.619535
GCTI_anxiety     0.617297
ZGCTI_anxiety    0.617297
Group            0.615157
Name: ISI_total, dtype: float64
correlations_df = pd.DataFrame(correlations)
# Reset the index and add a column for variable names
correlations_df.reset_index(inplace=True)
correlations_df.columns = ['Variable', 'Correlation']
# Print the DataFrame with multiple columns
#print(correlations_df)
pd.set_option('display.max_columns', None)
df_t = correlations_df.T
display(df_t)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | 167 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Variable | ISI_total | ZISI_total | ZPSQI_total | PSQI_total | SubGroup | ZGCTI_total | GCTI_total | GCTI_anxiety | ZGCTI_anxiety | Group | BDI_total | ZBDI_total | FIRST_total | ZFIRST_total | GCTI_reflection | ZGCTI_reflection | NEO_agreeableness | ZNEO_agreeableness | GCTI_worries | ZGCTI_worries | GCTI_negativeAffect | ZGCTI_negativeAffect | Remote | asq_school | Zasq_school | asq_future | Zasq_future | ZGCTI_thoughts | GCTI_thoughts | casq_sleepy | Zcasq_sleepy | ZTCQIR_worry | TCQIR_worry | Zasq_leisure | asq_leisure | casq_alert | Zcasq_alert | Zasq_attendance | asq_attendance | asq_teacher | Zasq_teacher | ZTCQI_R_Total | TCQI_R_Total | TCQIR_social_avoidance | ZTCQIR_social_avoidance | TCQIR_behavtioral_distraction | ZTCQIR_behavtioral_distraction | NotHispanic | ZTCQIR_reappraisal | TCQIR_reappraisal | asq_peer | Zasq_peer | asq_romantic | Zasq_romantic | NEO_neuroticism | ZNEO_neuroticism | cope_disengage_su | Zcope_socialsupp_instr | cope_socialsupp_instr | Zcope_planning | cope_planning | asq_home | Zasq_home | Zcope_emotions | cope_emotions | Zders_goals | ders_goals | PSRS_RWO | ZPSRS_RWO | NEO_Conscientiousness | ZNEO_Conscientiousness | cope_active | Zcope_active | Zcope_disengage_su | ASHS_substances | Zcope_acccept | cope_acccept | PSRS_RSE | ZPSRS_RSE | ZNEO_openness | NEO_openness | cope_socialsupp_emo | Zcope_socialsupp_emo | ders_impulse | cope_disengage_mental | Zcope_disengage_mental | cope_humor | Zcope_humor | Zcope_religion | casq_total | Zcasq_total | Zders_impulse | ZASHS_substances | ASHS_bedtimeRoutine | ZASHS_bedtimeRoutine | ZNEO_extraversion | NEO_extraversion | Zasq_responsibility | American_Indian | Asian | White | ZTCQIR_Aggressive_supression | TCQIR_Aggressive_supression | Native_Hawaiian | cope_denial | Zcope_denial | ders_nonaccpetance | Zders_nonaccpetance | cope_growth | Zcope_growth | Zders_strategies | ders_strategies | Zcope_disengage_emo | cope_disengage_emo | cope_suppression | Zcope_suppression | asq_finance | PSRS_total | ZPSRS_total | cope_restraint | Zcope_restraint | Zasq_finance | Black | ASHS_BedroomSharing | ZDERS_total | ders_total | asq_responsibility | MEQr_total | ZMEQr_total | PSRS_RSC | ZPSRS_RSC | TCQIR_cognitive_distraction | ZTCQIR_cognitive_distraction | ACE_tot | ZASHS_BedroomSharing | PSRS_PrR | ZPSRS_PrR | ZACE_tot | STAI_Y_total | ZSTAI_Y_total | Zders_clarity | ders_clarity | PSS_total | ZPSS_total | Sex | ZASHS_SleepEnvirnmont | ASHS_SleepEnvirnmont | PSRS_FRa | ZPSRS_FRa | PDS_FEMALE | ZASHS_sleepStability | ASHS_sleepStability | ders_awareness | Zders_awareness | ASHS_DaytimeSleep | ZASHS_DaytimeSleep | Hispanic | Age | PDS_MALE | ASHS_emotional | ZASHS_emotional | ASHS_total | ZASHS_total | ZASHS_cognitive | ASHS_cognitive | unknown_Race | unknown_Etnicity | Unnamed: 95 |
| Correlation | 1.0 | 1.0 | 0.708189 | 0.708189 | 0.692995 | 0.619535 | 0.619535 | 0.617297 | 0.617297 | 0.615157 | 0.596929 | 0.596929 | 0.523398 | 0.523398 | 0.509471 | 0.509471 | 0.465616 | 0.465616 | 0.459035 | 0.459035 | 0.458402 | 0.458402 | 0.453848 | 0.407773 | 0.407773 | 0.38315 | 0.38315 | 0.352051 | 0.352051 | 0.316141 | 0.316141 | 0.30236 | 0.30236 | 0.301567 | 0.301567 | 0.292942 | 0.292942 | 0.286191 | 0.286191 | 0.28059 | 0.28059 | 0.275811 | 0.275811 | 0.222331 | 0.222331 | 0.220245 | 0.220245 | 0.21373 | 0.19261 | 0.19261 | 0.182992 | 0.182992 | 0.181919 | 0.181919 | 0.169643 | 0.169643 | 0.168497 | 0.165903 | 0.165903 | 0.164836 | 0.164836 | 0.162166 | 0.162166 | 0.155069 | 0.155069 | 0.125583 | 0.125583 | 0.120577 | 0.120577 | 0.116151 | 0.116151 | 0.108338 | 0.108338 | 0.104242 | 0.103794 | 0.100861 | 0.100861 | 0.093433 | 0.093433 | 0.092483 | 0.092483 | 0.09132 | 0.09132 | 0.085062 | 0.07301 | 0.07301 | 0.065751 | 0.065751 | 0.062734 | 0.062725 | 0.062725 | 0.058432 | 0.050481 | 0.040162 | 0.040162 | 0.039893 | 0.039893 | 0.037892 | 0.032293 | 0.029444 | 0.028738 | 0.026792 | 0.026792 | 0.022713 | 0.019374 | 0.019374 | 0.017677 | 0.017677 | 0.017521 | 0.017521 | 0.012895 | 0.012895 | 0.008314 | 0.008314 | 0.006285 | 0.006285 | 0.002789 | -0.00443 | -0.00443 | -0.013747 | -0.013747 | -0.03327 | -0.040749 | -0.050997 | -0.063225 | -0.063225 | -0.065147 | -0.069157 | -0.069157 | -0.072064 | -0.072064 | -0.075808 | -0.075808 | -0.079921 | -0.082231 | -0.085338 | -0.085338 | -0.088105 | -0.102072 | -0.102072 | -0.106103 | -0.106103 | -0.11328 | -0.11328 | -0.11725 | -0.125915 | -0.125915 | -0.16801 | -0.16801 | -0.18813 | -0.223113 | -0.223113 | -0.224974 | -0.224974 | -0.227422 | -0.227422 | -0.235089 | -0.291307 | -0.339504 | -0.374654 | -0.37492 | -0.416037 | -0.416075 | -0.452816 | -0.452829 | NaN | NaN | NaN |
Highest Positive Correlations with ISI_total Identified for Total Population¶
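We see that these are:
- ZPSQI_total: z-score of PSQI_total
- PSQI_total: PSQI total (Pittsburgh Sleep Quality Index)
- SubGroup (the finer-grained group label: clean insomnia = 2, subclinical insomnia = 1)
- ZGCTI_total
- GCTI_total
- GCTI_anxiety
- ZGCTI_anxiety
Items 4-7 all come from the GCTI, two of them from its anxiety subscale, so it's reasonable to guess that anxiety is a leading contributor. Keep in mind, though, that a correlation by itself does not establish a cause.
As we can see (trivially), the ISI_total and ZISI_total columns are perfectly correlated: a z-score is just a shifted and rescaled copy of the raw score, which is also why most variables appear next to a Z-prefixed twin with an identical coefficient. More interesting is that PSQI_total, a separate sleep-quality measure, is the variable most strongly correlated with ISI_total (about 0.71).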
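Because a z-score is a linear rescaling of the raw score, each raw/Z pair carries the same information in a correlation ranking, and the target trivially correlates with itself. The sketch below shows one way to remove that redundancy before ranking. It uses synthetic data, and `drop_z_duplicates` is a hypothetical helper that assumes the naming pattern seen in this dataset (a "Z" prefix, with inconsistent capitalization as in ders_total / ZDERS_total).

```python
import numpy as np
import pandas as pd

# Synthetic data with the same naming pattern as the real columns.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"ISI_total": rng.normal(10, 4, 50)})
toy["PSQI_total"] = 0.5 * toy["ISI_total"] + rng.normal(0, 2, 50)
toy["ders_total"] = rng.normal(80, 15, 50)

def zscore(series: pd.Series) -> pd.Series:
    return (series - series.mean()) / series.std()

toy["ZISI_total"] = zscore(toy["ISI_total"])
toy["ZPSQI_total"] = zscore(toy["PSQI_total"])
toy["ZDERS_total"] = zscore(toy["ders_total"])

def drop_z_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Drop 'Z'-prefixed columns whose raw counterpart is also present."""
    lowered = {c.lower() for c in df.columns}
    keep = [c for c in df.columns
            if not (c.startswith("Z") and c[1:].lower() in lowered)]
    return df[keep]

target = "ISI_total"
ranked = (drop_z_duplicates(toy)
          .corr(numeric_only=True)[target]
          .drop(target)                      # the target always correlates 1.0 with itself
          .sort_values(ascending=False))
print(ranked)
```

A Z column with no raw counterpart in the frame is kept as-is, so nothing is silently lost.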
Finding Negative or Lesser Correlations¶
As I pondered the data, I wondered whether any of the other factors, such as those associated with "coping mechanisms" or others that might help alleviate insomnia or improve general mood, correlate negatively. If they do, then one might recommend those to patients.
correlations = insomnia_data.corr(numeric_only=True)[target_column].sort_values(ascending=True)
display(correlations)
ASHS_cognitive -0.452829
ZASHS_cognitive -0.452816
ZASHS_total -0.416075
ASHS_total -0.416037
ZASHS_emotional -0.374920
...
ZISI_total 1.000000
ISI_total 1.000000
unknown_Race NaN
unknown_Etnicity NaN
Unnamed: 95 NaN
Name: ISI_total, Length: 168, dtype: float64
Unsurprisingly, we see members of the ASHS metrics showing up. What's interesting is their (perhaps) relative importance in helping alleviate insomnia.
It is the cognitive subscale of the "Adolescent Sleep Hygiene Scale" (ASHS_cognitive) that correlates most negatively with insomnia severity (about -0.45).
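With only 95 participants, it is worth asking how precisely a coefficient like this is pinned down. One standard tool is the Fisher z-transformation, which gives an approximate confidence interval for a Pearson correlation. The sketch below is a rough guide rather than a formal analysis: `pearson_confidence_interval` is a hypothetical helper, and it assumes independent participants, roughly bivariate-normal scores, and that all 95 rows contributed to the coefficient.

```python
import numpy as np

def pearson_confidence_interval(r: float, n: int, z_crit: float = 1.96):
    """Approximate CI for a Pearson r via the Fisher z-transformation."""
    z = np.arctanh(r)              # Fisher transform of r
    se = 1.0 / np.sqrt(n - 3)      # standard error on the transformed scale
    low = np.tanh(z - z_crit * se)
    high = np.tanh(z + z_crit * se)
    return float(low), float(high)

# r for ASHS_cognitive vs ISI_total from the output above; n = 95 rows.
low, high = pearson_confidence_interval(-0.452829, 95)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Under these assumptions the interval runs from approximately -0.60 to -0.28, so the sign of the relationship looks stable even though its exact size is uncertain.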
Visualizing Correlations¶
#Let's select some interesting features
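# ders_awareness is included because the observations after the plot refer to it
features_study1 = ['Age', 'ISI_total', 'ASHS_total', 'PSQI_total', 'ders_awareness']
subdf_one_1 = insomnia_data[features_study1]
#plt.figure(figsize=(3, 2)) # Width: 3 inches, Height: 2 inches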
sns.pairplot(subdf_one_1, height=2, aspect =1)
plt.show()
There seems to be some correlation between Age and ders_awareness. I also see what may be a linear relationship between ASHS_total and ISI_total. Interestingly, this second relationship is negative.
Comparing Ethnicities¶
Let's take a look at the potential correlations between being Hispanic and the various disorders. First we take a look at females (indicated by a 0 in the sex column of the dataframe).
#width=2
#height=1
# ders_awareness is included because the observations below refer to it
columns_hispanic_1 = ['Age', 'ISI_total', 'ASHS_total', 'PSQI_total', 'ders_awareness']
#matplotlib.rcParams['figure.figsize'] = [width, height]
#plt.rcParams['figure.dpi'] = 200
# Keep only Hispanic females (Hispanic == 1, Sex == 0)
sns.set(style="whitegrid")
subdf = insomnia_data[(insomnia_data['Hispanic'] == 1) & (insomnia_data['Sex'] == 0)]
subdf_hisp_1 = subdf[columns_hispanic_1]
sns.pairplot(subdf_hisp_1, height=2, aspect=1)
plt.show()
Next let's take a look at males.
# ders_awareness is included because the observations below refer to it
columns_hispanic_1 = ['Age', 'ISI_total', 'ASHS_total', 'PSQI_total', 'ders_awareness']
# Keep only Hispanic males (Hispanic == 1, Sex == 1)
subdf = insomnia_data[(insomnia_data['Hispanic'] == 1) & (insomnia_data['Sex'] == 1)]
# Select the columns from the filtered rows, not from the full dataset
subdf_hispanic_1 = subdf[columns_hispanic_1]
sns.set(style="whitegrid")
sns.set_context("notebook")  # pairplot size is controlled by height/aspect below
sns.pairplot(subdf_hispanic_1, height=2, aspect=1)
plt.show()
Observations
It looks like for both genders there's something of a correlation between Age and ders_awareness.
That could imply that, for either Hispanics or people in general, there's some measurable increase in this feature as one ages. Let's continue looking at these features in this way, but this time for Asian populations.
columns_asian_1 = ['Age', 'ISI_total', 'ASHS_total', 'PSQI_total']
# Keep only Asian participants
subdf = insomnia_data[insomnia_data['Asian'] == 1]
# Select the columns from the filtered rows, not from the full dataset
subdf_asian_1 = subdf[columns_asian_1]
sns.pairplot(subdf_asian_1, height=2, aspect=1)
plt.show()
Asian Scores Based on Gender¶
Males
### Breaking Down With Respect to Gender
columns_asian_1 = ['Age', 'ASHS_total', 'ISI_total', 'ders_awareness']
# Keep only Asian males (Asian == 1, Sex == 1)
subdf = insomnia_data[(insomnia_data['Asian'] == 1) & (insomnia_data['Sex'] == 1)]
# Select the columns from the filtered rows, not from the full dataset
subdf_asian_1 = subdf[columns_asian_1]
sns.pairplot(subdf_asian_1, height=2, aspect=1)
plt.show()
Females
columns_asian_1 = ['Age', 'ASHS_total', 'ISI_total', 'ders_awareness']
# Keep only Asian females (Asian == 1, Sex == 0)
subdf = insomnia_data[(insomnia_data['Asian'] == 1) & (insomnia_data['Sex'] == 0)]
# Select the columns from the filtered rows, not from the full dataset
subdf_asian_1 = subdf[columns_asian_1]
sns.pairplot(subdf_asian_1, height=2, aspect=1)
plt.show()
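The subgroup cells above all repeat the same two steps: filter the rows, then select the columns. Wrapping those steps in a function makes it harder to plot the unfiltered frame by accident, and it gives a natural place to check how many participants remain, which matters when the whole study has only 95. The sketch below uses made-up rows, and `select_subgroup` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Made-up rows using the same column names as the real data.
toy = pd.DataFrame({
    "Asian": [1, 1, 0, 1, 0, 1],
    "Sex": [1, 0, 1, 1, 0, 0],
    "Age": [16.9, 17.3, 18.8, 19.3, 16.6, 19.6],
    "ISI_total": [12, 4, 9, 15, 3, 7],
})

def select_subgroup(df: pd.DataFrame, columns: list, **filters) -> pd.DataFrame:
    """Return `columns` for the rows where every filter column equals its value."""
    mask = pd.Series(True, index=df.index)
    for column, value in filters.items():
        mask &= df[column] == value
    return df.loc[mask, columns]

asian_males = select_subgroup(toy, ["Age", "ISI_total"], Asian=1, Sex=1)
print(len(asian_males), "participants in this subgroup")
print(asian_males)
```

The result can be passed straight to `sns.pairplot`, for example `sns.pairplot(select_subgroup(insomnia_data, columns_asian_1, Asian=1, Sex=1), height=2, aspect=1)`.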
## Plotting Together:
columns_asian_1=['Sex','Age','ASHS_total','ISI_total','ders_awareness']
subdfAsian= insomnia_data[(insomnia_data['Asian']==1)]
subdfAsian = subdfAsian[columns_asian_1]
print(subdf_one_1['Age'])
sns.scatterplot(data=subdfAsian, x='Age', y='ISI_total', hue='Sex', style='Sex')
plt.title('Scatter Plot with Different Colors and Markers')
plt.xlabel('Age')
plt.ylabel('ISI_total')
plt.legend(title='Category')
plt.show()
0 19.3
1 19.3
2 18.8
3 18.8
4 19.6
...
90 16.9
91 16.6
92 17.3
93 16.8
94 16.7
Name: Age, Length: 95, dtype: float64
correlation_matrix = subdfAsian.corr()
print(correlation_matrix)
                     Sex       Age  ASHS_total  ISI_total  ders_awareness
Sex             1.000000 -0.277602    0.027418  -0.336290        0.106390
Age            -0.277602  1.000000    0.061980  -0.283078        0.362403
ASHS_total      0.027418  0.061980    1.000000  -0.456341       -0.120614
ISI_total      -0.336290 -0.283078   -0.456341   1.000000       -0.109638
ders_awareness  0.106390  0.362403   -0.120614  -0.109638        1.000000
We seem to see something like a normal distribution at the left bottom corner.
Conclusions and Future Directions¶
Conclusions: Our analysis of the Adolescent Insomnia Study dataset has brought to light intriguing correlations that provide a glimpse into the intricate landscape of childhood insomnia. Notably, the strong correlation between anxiety and insomnia severity underscores the interplay between emotional well-being and sleep quality in adolescents. This finding accentuates the need for a holistic approach to addressing insomnia issues, one that encompasses both psychological and physiological factors.
Equally noteworthy are the negative correlations observed between the cognitive sleep hygiene scores and insomnia severity. This observation points to the potential efficacy of cognitive interventions in managing insomnia among adolescents. By incorporating cognitive strategies into educational programs or therapeutic approaches, we might empower adolescents with practical tools to enhance their sleep quality and overall well-being.
Future Directions: Our exploration, though illuminating, merely scratches the surface of the rich dataset at hand, which encompasses 174 columns. This vast dataset invites us to embark on a journey of deeper understanding and multidimensional exploration:
Feature Analysis: The dataset's breadth provides a unique opportunity to conduct an in-depth analysis of other features beyond the ones we explored. By systematically examining each feature's correlation with insomnia severity and other relevant metrics, we can uncover additional factors that might contribute to or mitigate childhood insomnia.
Feature Engineering: Beyond simple correlations, feature engineering techniques can help unveil intricate patterns and interactions within the data. Techniques such as principal component analysis (PCA) or other dimensionality reduction methods can reveal hidden relationships that might not be evident at first glance.
Clustering and Subgroup Analysis: Employing clustering algorithms, we can identify subgroups within the dataset that exhibit distinct sleep patterns, coping mechanisms, and psychometric traits. This could lead to the discovery of novel sleep-related profiles and help tailor interventions to specific groups.
Time Series Analysis: If the dataset includes temporal information, time series analysis can provide insights into the evolution of sleep patterns and psychometric traits over time. This approach is especially valuable for understanding the dynamic nature of childhood insomnia.
Predictive Modeling: Leveraging machine learning, we can develop predictive models that anticipate insomnia severity based on a combination of features. This predictive capability could inform early interventions and personalized strategies.
Ethnic and Demographic Variations: Given the diverse nature of the dataset, exploring how various features interact with ethnicity, socioeconomic status, and other demographic factors could unveil unique insights into how different populations experience childhood insomnia.
In essence, while our current analysis paints a compelling picture, the true potential of this dataset lies in its depth and diversity. By embracing the complexity of this dataset, researchers can unravel the multifaceted nature of childhood insomnia and contribute to informed interventions that cater to diverse needs.
In summary, as we journey forward, let's recognize that the exploration of this dataset is an ongoing endeavor. Its richness holds the promise of unearthing countless insights that could revolutionize our understanding of childhood insomnia and its management. By embracing the challenge and harnessing the power of advanced analytical techniques, we can make substantial strides in improving the sleep quality and well-being of adolescents worldwide.
In Part 2 we will explore:
- Linear Regressions and other models for predicting ISI_total
- Is there a peak at around 18 years old? Is it true for all age groups?
- Which groups of people were most represented and what implications might that have for the study?